================================================================================
REPLICATION PACKAGE
Sex-Disaggregated Citizenship Statistics: Data Gaps and Why These Matter
================================================================================

Authors: Ashley Mantha-Hollands and Maarten Vink
Affiliation: European University Institute
Status: Manuscript under review
Date: January 2026

================================================================================
CONTENTS
================================================================================

This replication package contains all data and code necessary to reproduce 
the figures in the manuscript. The package includes:

1. dataviz_replication.R          - R script to reproduce all figures
2. UK-citizenship-datasets-sep-2025.xlsx - UK Home Office citizenship data
3. destatis_2024.xlsx             - German citizenship data (2024)
4. destatis_2000_2023.xlsx        - German citizenship data (2000-2023)
5. px-x-0103030100_101.px         - Swiss citizenship data
6. README.txt                     - This file

================================================================================
SOFTWARE REQUIREMENTS
================================================================================

R version 4.0 or higher is required. The following R packages must be installed:

- readxl
- dplyr
- stringr
- tidyverse
- eurostat
- janitor
- patchwork
- scales
- countrycode
- BFS
- pxR

To install all required packages, run:
install.packages(c("readxl", "dplyr", "stringr", "tidyverse", "eurostat", 
                   "janitor", "patchwork", "scales", "countrycode", "BFS", "pxR"))

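As an optional alternative to the command above, the following minimal sketch
checks the R version and lists only those required packages that are not yet
installed. It is not part of dataviz_replication.R; the names required,
missing_packages and to_install are illustrative.

```r
# Packages listed under SOFTWARE REQUIREMENTS
required <- c("readxl", "dplyr", "stringr", "tidyverse", "eurostat",
              "janitor", "patchwork", "scales", "countrycode", "BFS", "pxR")

# Warn if the running R version is older than 4.0
if (getRversion() < "4.0.0") {
  warning("R 4.0 or higher is required; found ", as.character(getRversion()))
}

# Return the subset of pkgs that is not installed in the current library paths
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install only what is missing
} else {
  message("All required packages are installed.")
}
```
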
================================================================================
DATA SOURCES
================================================================================

1. EUROSTAT DATA (Figure 1)
   - Downloaded automatically via the eurostat R package
   - Dataset: migr_acqs (citizenship acquisition statistics)
   - No manual download required

2. GERMAN DATA (Figure 2a)
   - Source: DESTATIS (German Federal Statistical Office)
   - Two files included:
     * destatis_2000_2023.xlsx: Historical data (2000-2023)
     * destatis_2024.xlsx: Latest year data (2024)
   
   Note: These files were manually downloaded from:
   - 2000-2023: https://www-genesis.destatis.de/datenbank/online/statistic/12511/table/12511-0006
   - 2024: https://www-genesis.destatis.de/datenbank/online/statistic/12511/table/12511-0007
   
   See lines 111-129 in dataviz_replication.R for detailed download instructions.

3. UK DATA (Figure 2c)
   - Source: UK Home Office
   - File: UK-citizenship-datasets-sep-2025.xlsx
   - Contains naturalisation data by sex and application type

4. SWITZERLAND DATA (Figure 2b)
   - Downloaded automatically via the BFS R package
   - Asset number: 36074743
   - Source: Swiss Federal Statistical Office
   - No manual download required. Alternatively, the data can be imported
     manually from the file px-x-0103030100_101.px included in this package.
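
   The manual import of the Swiss file could look roughly like the sketch
   below. It assumes that the .px file sits in the working directory and that
   the pxR package is installed. The helper name load_swiss_px is hypothetical,
   and the replication script may read the file differently.

```r
# Read the local PC-Axis file shipped with the package into a data frame.
# load_swiss_px is an illustrative helper, not a function from the script.
load_swiss_px <- function(path = "px-x-0103030100_101.px") {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  if (!requireNamespace("pxR", quietly = TRUE)) {
    stop("Package 'pxR' is required to read .px files.")
  }
  px_object <- pxR::read.px(path)   # parse the PC-Axis file
  as.data.frame(px_object)          # convert to a long-format data frame
}

# Example call (run from the folder that contains the file):
# swiss_raw <- load_swiss_px()
# str(swiss_raw)
```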

================================================================================
INSTRUCTIONS FOR USE
================================================================================

STEP 1: Set up your working directory
   - Create a new folder for this replication package
   - Place all files from this package in that folder
   - Set this folder as your working directory in R using:
     setwd("path/to/your/folder")
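
   Before running the script, it may help to confirm that the package files
   are actually in the working directory. The sketch below uses the file names
   listed under CONTENTS; the helper name missing_files is illustrative.

```r
# Files listed under CONTENTS (README.txt itself is omitted)
package_files <- c("dataviz_replication.R",
                   "UK-citizenship-datasets-sep-2025.xlsx",
                   "destatis_2024.xlsx",
                   "destatis_2000_2023.xlsx",
                   "px-x-0103030100_101.px")

# Return the package files that are not present in the given folder
missing_files <- function(dir = ".") {
  package_files[!file.exists(file.path(dir, package_files))]
}

not_found <- missing_files(getwd())
if (length(not_found) > 0) {
  message("Not found in ", getwd(), ": ", paste(not_found, collapse = ", "))
} else {
  message("All package files found.")
}
```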

STEP 2: Install required packages
   - Run the installation command listed in SOFTWARE REQUIREMENTS above
   - This only needs to be done once

STEP 3: Run the replication script
   - Open dataviz_replication.R in RStudio or your preferred R environment
   - Run the entire script, or run sections individually to reproduce specific figures

STEP 4: Review outputs
   The script will generate the following figure files:
   
   - Fig.nat_rates_sex_country_mean.jpeg  (Figure 1)
     Mean citizenship acquisition rates across 33 European countries, 2009-2023
   
   - Fig2a_DE_marriage_share_by_sex.png   (Figure 2a)
     Share of marriage-based naturalisations in Germany by sex
   
   - Fig2b_CH_marriage_share_by_sex.png   (Figure 2b)
     Share of marriage-based naturalisations in Switzerland by sex
   
   - Fig2c_UK_marriage_share_by_sex.png   (Figure 2c)
     Share of marriage-based naturalisations in the UK by sex
   
   - Fig2_UK_CH_DE_marriage_share_by_sex.png  (Figure 2 combined)
     Combined plot showing all three countries

================================================================================
EXPECTED RUNTIME
================================================================================

The complete script should run in approximately 2-5 minutes on a standard 
desktop computer, depending on internet connection speed (for automated 
data downloads) and system specifications.

================================================================================
FIGURE DESCRIPTIONS
================================================================================

FIGURE 1: Eurostat naturalisation rates by sex
   - Displays mean citizenship acquisition rates for women and men across 
     33 European countries for the period 2009-2023
   - Data source: Eurostat (migr_acqs)
   - Format: JPEG (20 x 8 inches, 400 dpi)
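
   To make the stated format concrete, the sketch below works out the pixel
   size implied by 20 x 8 inches at 400 dpi and shows one way a plot could be
   saved with those settings. Saving via ggplot2::ggsave is an assumption
   about the script, and save_figure1 is a hypothetical helper.

```r
# Figure 1 format as stated above
fig_width_in  <- 20
fig_height_in <- 8
fig_dpi       <- 400

# Pixel dimensions = inches * dots per inch
fig_pixels <- c(width  = fig_width_in * fig_dpi,
                height = fig_height_in * fig_dpi)
print(fig_pixels)  # width 8000, height 3200

# Hypothetical helper: save a ggplot object with the Figure 1 settings
save_figure1 <- function(plot, file = "Fig.nat_rates_sex_country_mean.jpeg") {
  ggplot2::ggsave(filename = file, plot = plot,
                  width = fig_width_in, height = fig_height_in,
                  units = "in", dpi = fig_dpi)
}
```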

FIGURE 2: Marriage-based naturalisation shares by sex
   - Three-country comparison (Germany, Switzerland, UK) showing the 
     proportion of naturalisations granted through marriage-based pathways
   - Separate panels for women and men
   - Includes mean share lines for each sex-country combination
   
   Figure 2a (Germany): 
   - Note: German statistics include spousal transfer (§9 StAG) and 
     co-naturalisation by spouses and minor children (§10 Abs. 2 StAG)
   
   Figure 2b (Switzerland):
   - Compares "Simplified naturalisation" vs "Ordinary naturalisation"
   
   Figure 2c (UK):
   - Compares "Naturalisation based on marriage" vs 
     "Naturalisation based on residence"

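The quantities plotted in Figure 2 can be illustrated with a small base R
sketch. The counts below are made-up toy values, not data from the package,
and the column names are hypothetical. The sketch also assumes that the mean
share is the unweighted mean of the yearly shares, which may differ from the
script's definition.

```r
# Toy counts (illustrative only, not real data)
toy <- data.frame(
  year     = c(2020, 2020, 2021, 2021),
  sex      = c("Female", "Male", "Female", "Male"),
  marriage = c(30, 10, 20, 20),   # marriage-based naturalisations
  other    = c(70, 90, 80, 80)    # all other naturalisations
)

# Share of marriage-based naturalisations per year and sex
toy$share <- toy$marriage / (toy$marriage + toy$other)

# Mean share per sex across years (the horizontal mean line in each panel)
mean_share <- aggregate(share ~ sex, data = toy, FUN = mean)
print(mean_share)
```
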
================================================================================
NOTES AND CAVEATS
================================================================================

1. The German data are not downloaded automatically by the script. The 
   included files were downloaded manually; to update them, follow the 
   detailed instructions in lines 111-129 of the R script.

2. The script filters for adults (age 18+) in the Switzerland data.

3. Some German data points contain "e" markers (indicating estimated values) 
   which are converted to NA in the cleaning process.

4. UK data is filtered to include only "Female" and "Male" categories, 
   excluding any "Total" or other aggregate categories.

5. An internet connection is required for:
   - Downloading Eurostat data (Figure 1)
   - Downloading Swiss data (Figure 2b)

6. The script uses greyscale colour schemes for all plots (grey30 and grey70).
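
Notes 3 and 4 can be illustrated with toy values. The sketch below is not the
script's own cleaning code. It assumes that the "e" marker appears inside the
cell text and that the UK sex column holds labels such as "Total"; all values
and column names are hypothetical.

```r
# Note 3: cells carrying an "e" marker become NA, all others become numeric
raw_values <- c("1200", "350e", "e", "48")
has_marker <- grepl("e", raw_values, fixed = TRUE)
clean_values <- ifelse(has_marker, NA_real_,
                       suppressWarnings(as.numeric(raw_values)))
print(clean_values)

# Note 4: keep only the "Female" and "Male" rows
uk_toy <- data.frame(sex   = c("Female", "Male", "Total", "Unknown"),
                     count = c(10, 12, 23, 1))
uk_by_sex <- uk_toy[uk_toy$sex %in% c("Female", "Male"), ]
print(uk_by_sex)
```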

================================================================================
TROUBLESHOOTING
================================================================================

Problem: Package installation errors
Solution: Ensure you have the latest version of R installed. Try installing 
          packages individually if batch installation fails.

Problem: "File not found" errors
Solution: Verify that all data files are in your working directory and that 
          you have set the working directory correctly using setwd().

Problem: Eurostat data download fails
Solution: Check your internet connection. The eurostat package may occasionally 
          experience server issues; try again later if problems persist.
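
          One way to "try again" without rerunning by hand is a small retry
          wrapper, sketched below. with_retry is a hypothetical helper and is
          not part of the replication script; the commented call assumes the
          eurostat package's get_eurostat() function.

```r
# Call fn() up to `attempts` times, pausing between failed attempts
with_retry <- function(fn, attempts = 3, wait_seconds = 5) {
  for (i in seq_len(attempts)) {
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < attempts) {
      Sys.sleep(wait_seconds)
    }
  }
  stop("All ", attempts, " attempts failed.")
}

# Example (requires an internet connection and the eurostat package):
# acq <- with_retry(function() eurostat::get_eurostat("migr_acqs"))
```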

Problem: Swiss data download fails
Solution: Check your internet connection and verify that the BFS package is 
          correctly installed and loaded. Alternatively, import the data from 
          the included file px-x-0103030100_101.px.

Problem: Plots appear with incorrect formatting
Solution: Ensure all required fonts are available on your system. The script 
          uses standard R fonts which should be available on most systems.

================================================================================
CITATION
================================================================================

If you use this replication package, please cite:

Mantha-Hollands, Ashley and Maarten Vink. 2026. "Sex-Disaggregated Citizenship 
Statistics: Data Gaps and Why These Matter." Replication materials for 
manuscript under review. Harvard Dataverse, 
https://doi.org/10.7910/DVN/NL0M4Z.

================================================================================
CONTACT INFORMATION
================================================================================

For questions or issues related to this replication package, please contact:

maarten.vink@eui.eu

================================================================================
VERSION HISTORY
================================================================================

Version 1.0 (January 2026)
- Initial release for manuscript submission

================================================================================
LICENSE
================================================================================

Data sources retain their original licenses:
- Eurostat data: Free to use with attribution
- DESTATIS data: Licensed under the Data Licence Germany
- UK Home Office data: Open Government Licence
- Swiss Federal Statistical Office data: Free to use with attribution

All sources are cited in the paper.

================================================================================
